{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### Content under Creative Commons Attribution license CC-BY 4.0, code under BSD 3-Clause License © 2018  by D. Koehn, notebook style sheet by L.A. Barba, N.C. Clementi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<link href=\"https://fonts.googleapis.com/css?family=Merriweather:300,300i,400,400i,700,700i,900,900i\" rel='stylesheet' >\n",
       "<link href=\"https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,300i,400,400i,700,700i\" rel='stylesheet' >\n",
       "<link href='http://fonts.googleapis.com/css?family=Source+Code+Pro:300,400' rel='stylesheet' >\n",
       "<style>\n",
       "\n",
       "@font-face {\n",
       "    font-family: \"Computer Modern\";\n",
       "    src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
       "}\n",
       "\n",
       "\n",
       "#notebook_panel { /* main background */\n",
       "    background: rgb(245,245,245);\n",
       "}\n",
       "\n",
       "div.cell { /* set cell width */\n",
       "    width: 800px;\n",
       "}\n",
       "\n",
       "div #notebook { /* centre the content */\n",
       "    background: #fff; /* white background for content */\n",
       "    width: 1000px;\n",
       "    margin: auto;\n",
       "    padding-left: 0em;\n",
       "}\n",
       "\n",
       "#notebook li { /* More space between bullet points */\n",
       "margin-top:0.5em;\n",
       "}\n",
       "\n",
       "/* draw border around running cells */\n",
       "div.cell.border-box-sizing.code_cell.running { \n",
       "    border: 1px solid #111;\n",
       "}\n",
       "\n",
       "/* Put a solid color box around each cell and its output, visually linking them*/\n",
       "div.cell.code_cell {\n",
       "    background-color: rgb(256,256,256); \n",
       "    border-radius: 0px; \n",
       "    padding: 0.5em;\n",
       "    margin-left:1em;\n",
       "    margin-top: 1em;\n",
       "}\n",
       "\n",
       "\n",
       "div.text_cell_render{\n",
       "    font-family: 'Source Sans Pro', sans-serif;\n",
       "    line-height: 140%;\n",
       "    font-size: 110%;\n",
       "    width:680px;\n",
       "    margin-left:auto;\n",
       "    margin-right:auto;\n",
       "}\n",
       "\n",
       "/* Formatting for header cells */\n",
       ".text_cell_render h1 {\n",
       "    font-family: 'Merriweather', serif;\n",
       "    font-style:regular;\n",
       "    font-weight: bold;    \n",
       "    font-size: 250%;\n",
       "    line-height: 100%;\n",
       "    color: #004065;\n",
       "    margin-bottom: 1em;\n",
       "    margin-top: 0.5em;\n",
       "    display: block;\n",
       "}\t\n",
       ".text_cell_render h2 {\n",
       "    font-family: 'Merriweather', serif;\n",
       "    font-weight: bold; \n",
       "    font-size: 180%;\n",
       "    line-height: 100%;\n",
       "    color: #0096d6;\n",
       "    margin-bottom: 0.5em;\n",
       "    margin-top: 0.5em;\n",
       "    display: block;\n",
       "}\t\n",
       "\n",
       ".text_cell_render h3 {\n",
       "    font-family: 'Merriweather', serif;\n",
       "\tfont-size: 150%;\n",
       "    margin-top:12px;\n",
       "    margin-bottom: 3px;\n",
       "    font-style: regular;\n",
       "    color: #008367;\n",
       "}\n",
       "\n",
       ".text_cell_render h4 {    /*Use this for captions*/\n",
       "    font-family: 'Merriweather', serif;\n",
       "    font-weight: 300; \n",
       "    font-size: 100%;\n",
       "    line-height: 120%;\n",
       "    text-align: left;\n",
       "    width:500px;\n",
       "    margin-top: 1em;\n",
       "    margin-bottom: 2em;\n",
       "    margin-left: 80pt;\n",
       "    font-style: regular;\n",
       "}\n",
       "\n",
       ".text_cell_render h5 {  /*Use this for small titles*/\n",
       "    font-family: 'Source Sans Pro', sans-serif;\n",
       "    font-weight: regular;\n",
       "    font-size: 130%;\n",
       "    color: #e31937;\n",
       "    font-style: italic;\n",
       "    margin-bottom: .5em;\n",
       "    margin-top: 1em;\n",
       "    display: block;\n",
       "}\n",
       "\n",
       ".text_cell_render h6 { /*use this for copyright note*/\n",
       "    font-family: 'Source Code Pro', sans-serif;\n",
       "    font-weight: 300;\n",
       "    font-size: 9pt;\n",
       "    line-height: 100%;\n",
       "    color: grey;\n",
       "    margin-bottom: 1px;\n",
       "    margin-top: 1px;\n",
       "}\n",
       "\n",
       "    .CodeMirror{\n",
       "            font-family: \"Source Code Pro\";\n",
       "\t\t\tfont-size: 90%;\n",
       "    }\n",
       "/*    .prompt{\n",
       "        display: None;\n",
       "    }*/\n",
       "\t\n",
       "    \n",
       "    .warning{\n",
       "        color: rgb( 240, 20, 20 )\n",
       "        }  \n",
       "</style>\n",
       "<script>\n",
       "    MathJax.Hub.Config({\n",
       "                        TeX: {\n",
       "                           extensions: [\"AMSmath.js\"], \n",
       "                           equationNumbers: { autoNumber: \"AMS\", useLabelIds: true}\n",
       "                           },\n",
       "                tex2jax: {\n",
       "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
       "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
       "                },\n",
       "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
       "                \"HTML-CSS\": {\n",
       "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
       "                }\n",
       "        });\n",
       "</script>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Execute this cell to load the notebook's style sheet, then ignore it\n",
    "from IPython.core.display import HTML\n",
    "css_file = '../../style/custom.css'\n",
    "HTML(open(css_file, \"r\").read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Performance optimization of the 2D acoustic finite difference modelling code\n",
    "\n",
    "During the [last class](http://nbviewer.jupyter.org/github/daniel-koehn/Theory-of-seismic-waves-II/blob/master/05_2D_acoustic_FD_modelling/1_From_1D_to_2D_acoustic_FD_modelling_final.ipynb), it took us only 15 minutes to develop a 2D acoustic FD code based on the 1D code. However, with a runtime of roughly 3 minutes, the performance of this \"vanilla\" Python implementation was quite underwhelming. Therefore, the aim of this lesson is to optimize the performance of this code. \n",
    "\n",
    "Let's start with a slightly modified version of the original code. Basically, I moved the computation of the analytical solution outside the main code, the discretization parameters $nx,\\; nz,\\; nt,\\; dx,\\; dz,\\; dt$ are also fixed in order to minimize the input to the FD modelling function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "code_folding": [
     0
    ]
   },
   "outputs": [],
   "source": [
    "# Import Libraries \n",
    "# ----------------\n",
    "import numpy as np\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "from pylab import rcParams\n",
    "\n",
    "# Ignore Warning Messages\n",
    "# -----------------------\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "code_folding": [],
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Definition of modelling parameters\n",
    "# ----------------------------------\n",
    "xmax = 500.0 # maximum spatial extension of the 1D model in x-direction (m)\n",
    "zmax = xmax  # maximum spatial extension of the 1D model in z-direction(m)\n",
    "dx   = 1.0   # grid point distance in x-direction\n",
    "dz   = dx    # grid point distance in z-direction\n",
    "\n",
    "tmax = 0.502   # maximum recording time of the seismogram (s)\n",
    "dt   = 0.0010  # time step\n",
    "\n",
    "vp0  = 580.   # P-wave speed in medium (m/s)\n",
    "\n",
    "# acquisition geometry\n",
    "xr = 330.0 # x-receiver position (m)\n",
    "zr = xr    # z-receiver position (m)\n",
    "\n",
    "xsrc = 250.0 # x-source position (m)\n",
    "zsrc = 250.0 # z-source position (m)\n",
    "\n",
    "f0   = 40. # dominant frequency of the source (Hz)\n",
    "t0   = 4. / f0 # source time shift (s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define model discretization\n",
    "# ---------------------------\n",
    "\n",
    "nx = (int)(xmax/dx) # number of grid points in x-direction\n",
    "print('nx = ',nx)\n",
    "\n",
    "nz = (int)(zmax/dz) # number of grid points in x-direction\n",
    "print('nz = ',nz)\n",
    "\n",
    "nt = (int)(tmax/dt) # maximum number of time steps            \n",
    "print('nt = ',nt)\n",
    "\n",
    "ir = (int)(xr/dx)      # receiver location in grid in x-direction    \n",
    "jr = (int)(zr/dz)      # receiver location in grid in z-direction\n",
    "\n",
    "isrc = (int)(xsrc/dx)  # source location in grid in x-direction\n",
    "jsrc = (int)(zsrc/dz)  # source location in grid in x-direction\n",
    "\n",
    "# Source time function (Gaussian)\n",
    "# -------------------------------\n",
    "src  = np.zeros(nt + 1)\n",
    "time = np.linspace(0 * dt, nt * dt, nt)\n",
    "\n",
    "# 1st derivative of a Gaussian\n",
    "src  = -2. * (time - t0) * (f0 ** 2) * (np.exp(- (f0 ** 2) * (time - t0) ** 2))\n",
    "\n",
    "# Analytical solution\n",
    "# -------------------\n",
    "G    = time * 0.\n",
    "\n",
    "# Initialize coordinates\n",
    "# ----------------------\n",
    "x    = np.arange(nx)\n",
    "x    = x * dx       # coordinates in x-direction (m)\n",
    "\n",
    "z    = np.arange(nz)\n",
    "z    = z * dz       # coordinates in z-direction (m)\n",
    "\n",
    "# calculate source-receiver distance\n",
    "r = np.sqrt((x[ir] - x[isrc])**2 + (z[jr] - z[jsrc])**2)\n",
    "\n",
    "for it in range(nt): # Calculate Green's function (Heaviside function)\n",
    "    if (time[it] - r / vp0) >= 0:\n",
    "        G[it] = 1. / (2 * np.pi * vp0**2) * (1. / np.sqrt(time[it]**2 - (r/vp0)**2))\n",
    "Gc   = np.convolve(G, src * dt)\n",
    "Gc   = Gc[0:nt]\n",
    "lim  = Gc.max() # get limit value from the maximum amplitude\n",
    "\n",
    "# Initialize model (assume homogeneous model)\n",
    "# -------------------------------------------\n",
    "vp    = np.zeros((nx,nz))\n",
    "vp2    = np.zeros((nx,nz))\n",
    "\n",
    "vp  = vp + vp0       # initialize wave velocity in model\n",
    "vp2 = vp**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "code_folding": [
     42
    ]
   },
   "outputs": [],
   "source": [
    "# 2D Wave Propagation (Finite Difference Solution) \n",
    "# ------------------------------------------------\n",
    "def FD_2D_acoustic_vanilla():        \n",
    "    \n",
    "    # Initialize empty pressure arrays\n",
    "    # --------------------------------\n",
    "    p    = np.zeros((nx,nz)) # p at time n (now)\n",
    "    pold = np.zeros((nx,nz)) # p at time n-1 (past)\n",
    "    pnew = np.zeros((nx,nz)) # p at time n+1 (present)\n",
    "    d2px = np.zeros((nx,nz)) # 2nd spatial x-derivative of p\n",
    "    d2pz = np.zeros((nx,nz)) # 2nd spatial z-derivative of p\n",
    "\n",
    "    # Initialize empty seismogram\n",
    "    # ---------------------------\n",
    "    seis = np.zeros(nt) \n",
    "    \n",
    "    # Calculate Partial Derivatives\n",
    "    # -----------------------------\n",
    "    for it in range(nt):\n",
    "    \n",
    "        # FD approximation of spatial derivative by 3 point operator\n",
    "        for i in range(1, nx - 1):\n",
    "            for j in range(1, nz - 1):\n",
    "                \n",
    "                d2px[i,j] = (p[i + 1,j] - 2 * p[i,j] + p[i - 1,j]) / dx ** 2                \n",
    "                d2pz[i,j] = (p[i,j + 1] - 2 * p[i,j] + p[i,j - 1]) / dz ** 2\n",
    "\n",
    "        # Time Extrapolation\n",
    "        # ------------------\n",
    "        pnew = 2 * p - pold + vp ** 2 * dt ** 2 * (d2px + d2pz)\n",
    "\n",
    "        # Add Source Term at isrc\n",
    "        # -----------------------\n",
    "        # Absolute pressure w.r.t analytical solution\n",
    "        pnew[isrc,jsrc] = pnew[isrc,jsrc] + src[it] / (dx * dz) * dt ** 2\n",
    "                \n",
    "        # Remap Time Levels\n",
    "        # -----------------\n",
    "        pold, p = p, pnew\n",
    "    \n",
    "        # Output of Seismogram\n",
    "        # -----------------\n",
    "        seis[it] = p[ir,jr]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You know what happened the last time, we executed the cell below. We had to wait 3 minutes until the modelling run finished. So for safety reasons I commented the code execution and defined the runtime. You should adapt the value of the timing measurement `t_vanilla_python` by the value of your computer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#%%time\n",
    "#FD_2D_acoustic_vanilla()\n",
    "t_vanilla_python = 190.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Just-In-Time (JIT) code compilation with Numba \n",
    "\n",
    "The poor performance of the vanilla Python code is due to the nested FOR loops to compute the 2nd spatial FD derivatives. We can optimize the performance using the `Numba ` library for Python which turns Python functions into C-style compiled functions using [LLVM](https://en.wikipedia.org/wiki/LLVM). A nice introduction to Numba was presented at the SciPy conference 2016 by Gil Forsyth & Lorena Barba with the title \n",
    "\n",
    "**Numba: Tell those C++ bullies to get lost**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import YouTubeVideo\n",
    "YouTubeVideo('SzBi3xdEF2Y')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The associated Jupyter notebooks can be cloned from [here](https://github.com/barbagroup/numba_tutorial_scipy2016).\n",
    "\n",
    "First, we have to install Numba, which is quite easy using Anaconda:\n",
    "\n",
    "`conda install numba` \n",
    "\n",
    "From the Numba library we import **jit**: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import JIT from Numba\n",
    "from numba import jit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The only thing, we modify in our original Python code is to add the function decorator \n",
    "\n",
    "`@jit(nopython=True)`\n",
    "\n",
    "which tags the function `FD_2D_acoustic_JIT` to be compiled:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's run the code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_JIT_python =  # runtime of JIT compiled Python code (s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another approach to get rid of the nested FOR-loops is to use Numpy array operations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "seis_FD_numpy = FD_2D_acoustic_numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_numpy_python =   # runtime of JIT compiled Python code (s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "seis_FD_numpy_JIT = FD_2D_acoustic_numpy_JIT()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_numpy_python_JIT =  # runtime of JIT compiled Python code (s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So JIT could also improve the performance of the code using `NumPy` array operations, but the performance of the compiled code with the nested FOR loops has a slight edge in terms of performance. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comparison with a C++ implementation\n",
    "\n",
    "How does the performance of the JIT-codes compare to a C++ bully code? I invested 1 hour to write [this C++ code](https://github.com/daniel-koehn/Theory-of-seismic-waves-II/tree/master/05_2D_acoustic_FD_modelling/cxx/2dac.cpp), which is similar to the 2D acoustic FD Python code. \n",
    "\n",
    "In order to use similar matrix data structures in C++ as in Python, I use the `Eigen` library:\n",
    "\n",
    "www.eigen.tuxfamily.org/\n",
    "\n",
    "which also allows auto-vectorization of matrix-matrix products. To compile the source code, you need a C++ compiler, e.g. `g++` and the `Eigen` library which can either be compiled from source or installed using the package manager of your Linux distribution. \n",
    "\n",
    "I also recommend to use the moderate optimization option `-O2` and Advanced Vector Extensions ([AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)) `-mavx` during code compilation for a significant performance increase of the code. Let's compile and run the code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compile and run C++-version\n",
    "\n",
    "\n",
    "# load seismogram\n",
    "time_Cpp, seis_FD_Cpp = np.loadtxt('seis.dat', delimiter='\\t', skiprows=0, unpack=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_cxx =  # runtime of C++ code (s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The C++ code performance is comparable with the JIT version of the Python code using `NumPy` operations, which is quite impressive considering the simple Python code optimization using JIT.\n",
    "\n",
    "To check if the optimized codes are not only fast but still produce reasonable modelling results, it is a good idea to check if the seismograms of the optimized codes still coincide with the analytical solution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare FD Seismogram with analytical solution\n",
    "# ----------------------------------------------\n",
    "# Define figure size\n",
    "rcParams['figure.figsize'] = 12, 5\n",
    "plt.plot(time, seis_FD_JIT, 'b-',lw=3,label=\"FD solution (Python + JIT)\") # plot FD seismogram\n",
    "plt.plot(time, seis_FD_numpy, 'g-',lw=3,label=\"FD solution (Python + NumPy)\") # plot FD seismogram\n",
    "plt.plot(time, seis_FD_numpy_JIT, 'k-',lw=3,label=\"FD solution (Python + NumPy + JIT)\") # plot FD seismogram\n",
    "plt.plot(time_Cpp, seis_FD_Cpp, 'y-',lw=3,label=\"FD solution (C++)\") # plot FD seismogram\n",
    "Analy_seis = plt.plot(time,Gc,'r--',lw=3,label=\"Analytical solution\") # plot analytical solution\n",
    "plt.xlim(time[0], time[-1])\n",
    "plt.title('Seismogram')\n",
    "plt.xlabel('Time (s)')\n",
    "plt.ylabel('Amplitude')\n",
    "plt.legend()\n",
    "plt.grid()\n",
    "plt.show() "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we produce some nice bar charts to compare the performance of the different codes developed in this `Jupyter` notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# define codes\n",
    "codes = ('Python', 'Python + JIT', 'Python + NumPy', 'Python + NumPy + JIT', 'C++')\n",
    "y_pos = np.arange(len(codes))\n",
    "\n",
    "# runtime\n",
    "performance = [t_vanilla_python,t_JIT_python,t_numpy_python,t_numpy_python_JIT,t_cxx]\n",
    "\n",
    "# speed-up with respect to the non-optimized code\n",
    "speedup = [t_vanilla_python/t_vanilla_python,\n",
    "           t_vanilla_python/t_JIT_python,\n",
    "           t_vanilla_python/t_numpy_python,\n",
    "           t_vanilla_python/t_numpy_python_JIT,\n",
    "           t_vanilla_python/t_cxx]\n",
    "\n",
    "# Define figure size\n",
    "rcParams['figure.figsize'] = 12, 8\n",
    "\n",
    "# Plot runtimes of 2D acoustic FD codes\n",
    "ax1 = plt.subplot(211)\n",
    "\n",
    "plt.bar(y_pos, performance, align='center', alpha=0.5)\n",
    "plt.xticks(y_pos, codes)\n",
    "plt.ylabel('Runtime (s)')\n",
    "plt.title('Performance comparison of 2D acoustic FD modelling codes')\n",
    " \n",
    "# make tick labels invisible\n",
    "plt.setp(ax1.get_xticklabels(), visible=False)   \n",
    "\n",
    "# Plot speedup of 2D acoustic FD codes\n",
    "ax2 = plt.subplot(212, sharex=ax1)    \n",
    "\n",
    "plt.bar(y_pos, speedup, align='center', alpha=0.5,color='r')\n",
    "plt.xticks(y_pos, codes)\n",
    "plt.ylabel('Speedup ()')\n",
    "plt.tight_layout() \n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Is this the best result we can achieve or are further code improvements possible? \n",
    "\n",
    "Using domain decomposition with the **Message-Passing Interface MPI** to distribute the workload over multiple CPU cores, combined with a partioning of the tasks in each domain using **Multithreading** can significantly improve the code performance. One key is the manual optimization of CPU and GPU kernels, especially regarding memory access times or communication between MPI processes. As an example I plotted the runtime and speedup for the same homogeneous acoustic problem from this Jupyter notebook using the 2D acoustic modelling code [DENISE Black-Edition](https://github.com/daniel-koehn/DENISE-Black-Edition) which only relies on MPI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define figure size\n",
    "rcParams['figure.figsize'] = 12, 6\n",
    "\n",
    "# number of cores and runtime\n",
    "cores = np.array([1, 2, 4, 8, 16, 25])\n",
    "t_denise = np.array([0.926, 0.482, 0.234, 0.123, 0.067, 0.055])\n",
    "\n",
    "# speed-up with respect to the runtime of the 1st core\n",
    "# and linear speedup\n",
    "speedup_denise = t_denise[0] / t_denise\n",
    "linear_speedup = cores\n",
    "\n",
    "# plot runtime\n",
    "ax2 = plt.subplot(121)\n",
    "plt.plot(cores, t_denise, 'rs-',lw=3,label=\"Runtime\")\n",
    "plt.title('Runtime DENISE Black-Edition')\n",
    "plt.xlabel('Number of cores')\n",
    "plt.ylabel('Runtime (s)')\n",
    "plt.legend()\n",
    "plt.grid()\n",
    "\n",
    "# plot speedup\n",
    "ax2 = plt.subplot(122)\n",
    "plt.plot(cores, speedup_denise, 'bs-',lw=3,label=\"Speedup DENISE\")\n",
    "plt.plot(cores, linear_speedup, 'k-',lw=3,label=\"Linear speedup\")\n",
    "plt.title('Speedup DENISE Black-Edition')\n",
    "plt.xlabel('Number of cores')\n",
    "plt.ylabel('Speedup')\n",
    "plt.legend()\n",
    "plt.grid()\n",
    "\n",
    "plt.show() "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using less than 2 cores, the JIT compiled Python code with a runtime of 353 ms is faster than the MPI code. Utilizing more cores, the DENISE code leads to a steady runtime decrease. However, notice that the speedup is not linear anymore when using 16 cores or more. This can be explained by excessive communication time between the MPI processes, when the domain sizes decreases. More details about MPI and Multithreading optimizations are beyond the scope of the TEW2 course, but will be the topic of a future HPC lecture ...\n",
    "\n",
    "To get an idea about the difference between JIT optimized Python codes and manually optimized codes, I recommend a SciPy 2016 talk by Andreas Klöckner:\n",
    "\n",
    "**High Performance with Python: Architectures, Approaches & Applications**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import YouTubeVideo\n",
    "YouTubeVideo('Zz_6P5qAJck')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What we learned:\n",
    "\n",
    "* The performance of our 2D acoustic FD modelling code, developed in the previous class, suffered from the nested FOR loops\n",
    "* Using JIT compilation of the Python code using `Numba`, the performance could be significantly improved by a factor XXx\n",
    "* Alternatively, we can replace the nested FOR loops by `NumPy` array operations to improve the runtime performance\n",
    "* The performance of the JIT compiled Python code is comparable with a C++ implementation\n",
    "* Check if the modelling results of your optimized Python codes are still correct"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
